Abstract. Adaptive codes associate variable-length codewords to symbols being
encoded depending on the previous symbols in the input data string. This class
of codes has been introduced as a new class of non-standard variable-length
codes. New algorithms for data compression, based on adaptive codes of order one,
have been presented in [10], where we have shown empirically that for a large class
of input data strings, these algorithms substantially outperform the Lempel-Ziv
universal data compression algorithm [18]. EAH has been introduced in [11] as an
improved generalization of these algorithms. In this paper, we present a translation
of the EAH algorithm into graph theory.
1. Introduction

New algorithms for data compression, based on adaptive codes of order
one, have been presented in [10], where we have shown empirically
that for a large class of input data strings, these algorithms
substantially outperform the well-known Lempel-Ziv universal data
compression algorithm [18].

EAH (Encoder based both on Adaptive codes and the Huffman algorithm)
has been proposed in [11] as an improved and generalized
version of the algorithms presented in [10]. The work carried out so
far [10, 11] has shown empirically that EAH is a very promising
data compression algorithm, as the examples presented in the following
sections illustrate again. In this paper, we translate the EAH
encoder, as well as some of the results obtained so far, into graph
theory.

In the remainder of this section, we recall the basic notions and
notations used throughout the paper. We denote by $|S|$ the cardinality
of the set $S$; if $x$ is a string of finite length, then $|x|$ denotes the
length of $x$. The empty word is denoted by $\lambda$. Let us denote by
$\Delta^*$ the set $\bigcup_{n=0}^{\infty} \Delta^n$ and by $\Delta^+$ the set
$\bigcup_{n=1}^{\infty} \Delta^n$. Also, denote by $\Delta^{\leq n}$ the set
$\bigcup_{i=0}^{n} \Delta^i$ and by $\Delta^{\geq n}$ the set
$\bigcup_{i=n}^{\infty} \Delta^i$, where $\Delta^0$ denotes the set $\{\lambda\}$.
$\mathbb{N}$ denotes the set of natural numbers, and
$\mathbb{N}^* = \mathbb{N} - \{0\}$.

* The research reported in this paper was partially supported by CNCSIS grant 
632/2004. 
Let $X$ be a finite and nonempty subset of $\Delta^+$, and $w \in \Delta^+$. A
decomposition of $w$ over $X$ is any sequence of words $u_1, u_2, \ldots, u_h$ with
$u_i \in X$, $1 \leq i \leq h$, such that $w = u_1 u_2 \ldots u_h$. A code over
$\Delta$ is any nonempty set $C \subseteq \Delta^+$ such that each word
$w \in \Delta^+$ has at most one decomposition over $C$. A prefix code over
$\Delta$ is any code $C$ over $\Delta$ such that no word in $C$ is a proper
prefix of another word in $C$.

2. Adaptive Codes: A Short Review 

The aim of this section is to briefly review some basic definitions, 
results, and notations directly related to adaptive codes. 

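DEFINITION 1. Let $\Sigma$, $\Delta$ be alphabets. A function
$c : \Sigma \times \Sigma^{\leq n} \to \Delta^+$, $n \geq 1$, is called adaptive
code of order $n$ if its unique homomorphic extension
$\bar{c} : \Sigma^* \to \Delta^*$, defined by:

• $\bar{c}(\lambda) = \lambda$

• $\bar{c}(\sigma_1 \sigma_2 \ldots \sigma_m) = c(\sigma_1, \lambda)\, c(\sigma_2, \sigma_1) \ldots c(\sigma_{n-1}, \sigma_1 \sigma_2 \ldots \sigma_{n-2})$
  $c(\sigma_n, \sigma_1 \sigma_2 \ldots \sigma_{n-1})\, c(\sigma_{n+1}, \sigma_1 \sigma_2 \ldots \sigma_n)\, c(\sigma_{n+2}, \sigma_2 \sigma_3 \ldots \sigma_{n+1})$
  $c(\sigma_{n+3}, \sigma_3 \sigma_4 \ldots \sigma_{n+2}) \ldots c(\sigma_m, \sigma_{m-n} \sigma_{m-n+1} \ldots \sigma_{m-1})$

for all $\sigma_1 \sigma_2 \ldots \sigma_m \in \Sigma^+$, is injective.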

As specified by the definition above, an adaptive code of order n as- 
sociates variable-length codewords to symbols being encoded depending 
on the previous n symbols in the input data string. 

EXAMPLE 1. Let us consider two alphabets $\Sigma = \{a, b, c\}$ and
$\Delta = \{0, 1\}$, and a function
$c : \Sigma \times \Sigma^{\leq 2} \to \Delta^+$ constructed by the following
table. One can easily verify that $\bar{c}$ is injective, and according to
Definition 1, $c$ is an adaptive code of order two.



Table I. An adaptive code of order two. (The entries of the table are not
reproduced here.)



Let $x = abacca \in \Sigma^+$ be an input data string. Using the definition
above, we encode $x$ by:

$\bar{c}(x) = c(a, \lambda)\, c(b, a)\, c(a, ab)\, c(c, ba)\, c(c, ac)\, c(a, cc) = 001011111100$.
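The way the homomorphic extension of Definition 1 walks over an input string can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `encode`, the dictionary representation of $c$, and the small order-one code `toy_code` are all hypothetical, and `toy_code` is not the code of Table I.

```python
def encode(c, x, n):
    """Apply the homomorphic extension of an order-n adaptive code c to x.

    c maps (symbol, context) pairs to codewords, where the context is the
    string of up to n symbols that precede the symbol in x.
    """
    out = []
    for i, symbol in enumerate(x):
        context = x[max(0, i - n):i]  # the empty string plays the role of lambda
        out.append(c[(symbol, context)])
    return "".join(out)


# Hypothetical order-one code over {a, b}; NOT the code of Table I.
toy_code = {
    ("a", ""): "0", ("b", ""): "1",
    ("a", "a"): "0", ("b", "a"): "1",
    ("a", "b"): "11", ("b", "b"): "0",
}

print(encode(toy_code, "abba", 1))  # c(a,lambda) c(b,a) c(b,b) c(a,b)
```

The context window grows from the empty word up to length $n$ and then slides, exactly as in the product that defines $\bar{c}$.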
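Let $c : \Sigma \times \Sigma^{\leq n} \to \Delta^+$ be an adaptive code of
order $n$, $n \geq 1$. We denote by $C_{c, \sigma_1 \sigma_2 \ldots \sigma_h}$ the
set $\{c(\sigma, \sigma_1 \sigma_2 \ldots \sigma_h) \mid \sigma \in \Sigma\}$, for all
$\sigma_1 \sigma_2 \ldots \sigma_h \in \Sigma^{\leq n} - \{\lambda\}$, and by
$C_{c, \lambda}$ the set $\{c(\sigma, \lambda) \mid \sigma \in \Sigma\}$. We write
$C_{\sigma_1 \sigma_2 \ldots \sigma_h}$ instead of
$C_{c, \sigma_1 \sigma_2 \ldots \sigma_h}$, and $C_{\lambda}$ instead of
$C_{c, \lambda}$ whenever there is no confusion.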

Let us denote by $AC(\Sigma, \Delta, n)$ the set
$\{c : \Sigma \times \Sigma^{\leq n} \to \Delta^+ \mid c$ is an
adaptive code of order $n\}$. The proof of the following theorem can be
found in !) . 

THEOREM 1. Let $\Sigma$ and $\Delta$ be two alphabets, and
$c : \Sigma \times \Sigma^{\leq n} \to \Delta^+$ a function. If $C_u$ is a prefix
code, for all $u \in \Sigma^{\leq n}$, then $c \in AC(\Sigma, \Delta, n)$.
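The sufficient condition of Theorem 1 is easy to test mechanically. The sketch below assumes that $c$ is stored as a dictionary from (symbol, context) pairs to codewords; the names `is_prefix_code` and `satisfies_theorem_1` are hypothetical. It additionally requires the codewords within one context to be pairwise distinct, which is an assumption made here so that different symbols never share a codeword.

```python
from collections import defaultdict


def is_prefix_code(words):
    """True if the words are distinct and none is a proper prefix of another."""
    words = list(words)
    if len(set(words)) != len(words):
        return False
    return not any(u != v and v.startswith(u) for u in words for v in words)


def satisfies_theorem_1(c):
    """Check that C_u is a prefix code for every context u present in c."""
    by_context = defaultdict(list)
    for (symbol, context), codeword in c.items():
        by_context[context].append(codeword)
    return all(is_prefix_code(ws) for ws in by_context.values())


# Hypothetical order-one tables used only to exercise the check.
good = {("a", ""): "0", ("b", ""): "1", ("a", "a"): "0", ("b", "a"): "1",
        ("a", "b"): "11", ("b", "b"): "0"}
bad = {("a", ""): "0", ("b", ""): "01"}
print(satisfies_theorem_1(good), satisfies_theorem_1(bad))
```

When the check succeeds, a decoder can read codewords greedily, because the context, and hence the prefix code in use, is known from the symbols already decoded.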



3. Modelling the EAH Encoder using the Graph Theory 

This section models the EAH data compression algorithm, together with
some of the results obtained so far, in terms of graph theory.



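DEFINITION 2. Let $\Sigma$ be an alphabet, and $w = u_1 u_2 \ldots u_h \in \Sigma^+$,
where $u_i \in \Sigma$ for all $i$, $1 \leq i \leq h$. The adaptive graph of order
$n$ associated to $w$ is a 3-tuple $G_n(w) = (V, E, f)$, where
$f : E \to \mathbb{N} \times \{0, 1\}^*$ is a function, and $(V, E)$ is a
directed graph defined as below.

• $V = Vbase \cup Vaux$.

• $E = Ebase \cup Eaux$.

• $Vbase = \begin{cases}
  \emptyset & \text{if } h \leq n. \\
  \{u_j u_{j+1} \ldots u_{j+n-1} \mid 1 \leq j \leq h - n\} \cup \{u_j \mid n + 1 \leq j \leq h\} & \text{if } h > n.
  \end{cases}$

• $Vaux = \begin{cases}
  \emptyset & \text{if } n \geq 2. \\
  \{u_j\text{\_aux} \mid j \leq h - 1 \text{ and } u_j = u_{j+1}\} & \text{if } n = 1.
  \end{cases}$

• $Ebase = \begin{cases}
  \emptyset & \text{if } h \leq n. \\
  \{(u_j \ldots u_{j+n-1}, u_{j+n}) \mid j \geq 1\} & \text{if } h > n \text{ and } n \geq 2. \\
  \{(u_j, u_{j+1}) \mid u_j \neq u_{j+1}\} & \text{if } h > n \text{ and } n = 1.
  \end{cases}$

• $Eaux = \begin{cases}
  \emptyset & \text{if } h \leq n. \\
  \{(u_j, u_j\text{\_aux}) \mid u_j = u_{j+1}\} \cup \{(u_j\text{\_aux}, u_j) \mid u_j = u_{j+1}\} & \text{if } h > n = 1. \\
  \{(u_j, u_{j-n+1} \ldots u_j) \mid n + 1 \leq j \leq h - 1\} & \text{if } h > n \geq 2.
  \end{cases}$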
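For all $e \in E$, if $f(e) = (x, y)$ then $x$ is defined as follows.

$x = \begin{cases}
|\{i \mid u_i \ldots u_{i+n} = u_j \ldots u_{j+n}\}| & \text{if } e = (u_j \ldots u_{j+n-1}, u_{j+n}). \\
|\{i \mid u_i = u_{i+1} = u_j\}| & \text{if } e = (u_j, u_j\text{\_aux}). \\
0 & \text{if } e = (u_j\text{\_aux}, u_j). \\
0 & \text{if } e = (u_j, u_{j-n+1} \ldots u_j).
\end{cases}$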



Let us introduce some basic notations used throughout the section,
and then give a few examples of adaptive graphs. Let $\Sigma$ be an alphabet,
$w \in \Sigma^+$, $G_n(w) = (V, E, f)$, and $x \in V = Vbase \cup Vaux$.

• $inputb^{G_n(w)}(x) = \{(y, x) \in E \mid y \in Vbase\}$.

• $inputa^{G_n(w)}(x) = \{(y, x) \in E \mid y \in Vaux\}$.

• $outputb^{G_n(w)}(x) = \{(x, y) \in E \mid y \in Vbase\}$.

• $outputa^{G_n(w)}(x) = \{(x, y) \in E \mid y \in Vaux\}$.

• $ib^{G_n(w)}(x) = |inputb^{G_n(w)}(x)|$.

• $ob^{G_n(w)}(x) = |outputb^{G_n(w)}(x)|$.

• $ia^{G_n(w)}(x) = |inputa^{G_n(w)}(x)|$.

• $oa^{G_n(w)}(x) = |outputa^{G_n(w)}(x)|$.

EXAMPLE 2. Let $\Sigma = \{a, b, c, d\}$ be an alphabet,
$w = abccdbbab \in \Sigma^+$, and $G_1(w) = (V, E, f)$ the adaptive graph of
order one associated to $w$.

It is easy to verify that we get the following results.

• $V = \{a, b, c, d, c\text{\_aux}, b\text{\_aux}\}$.

• $E = \{(a, b), (b, a), (b, b\text{\_aux}), (b\text{\_aux}, b), (b, c), (c, c\text{\_aux}), (c, d),
  (c\text{\_aux}, c), (d, b)\}$.
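For order one, the vertex and edge sets of Definition 2, together with the first component of $f$, can be computed in a single pass over $w$. The following is a minimal sketch, assuming that symbols are single characters and that an auxiliary vertex is named by appending `_aux` to its symbol; the function name `adaptive_graph_order_one` is hypothetical.

```python
def adaptive_graph_order_one(w):
    """Return (V, counts) for G_1(w); counts maps each edge e to f(e).1."""
    if len(w) <= 1:                      # case h <= n of Definition 2
        return set(), {}
    vertices = set(w)                    # Vbase
    counts = {}
    for prev, cur in zip(w, w[1:]):
        if prev == cur:
            aux = prev + "_aux"
            vertices.add(aux)            # Vaux
            counts[(prev, aux)] = counts.get((prev, aux), 0) + 1
            counts.setdefault((aux, prev), 0)   # return edge is never counted
        else:
            counts[(prev, cur)] = counts.get((prev, cur), 0) + 1
    return vertices, counts


V, counts = adaptive_graph_order_one("abccdbbab")
print(sorted(V))
print(sorted(counts.items()))
```

On the string of Example 2 this yields the six vertices and nine edges listed above; the edge $(a, b)$ is counted twice because $ab$ occurs twice in $w$.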



Figure 1. $G_1(abccdbbab)$.
Let $U = (u_1, u_2, \ldots, u_k)$ be a $k$-tuple. We denote by $U.i$ the $i$-th
component of $U$, that is, $u_i = U.i$, for all $i$, $1 \leq i \leq k$. The 0-tuple
is denoted by $()$. The length of a tuple $U$ is denoted by $Len(U)$. If
$V = (v_1, v_2, \ldots, v_b)$, $M = (m_1, m_2, \ldots, m_r, U)$,
$N = (n_1, n_2, \ldots, n_s, V)$,
$P = (p_1, \ldots, p_{i-1}, p_i, p_{i+1}, \ldots, p_t)$ are tuples, and $q$ is an
element or a tuple, then we define $P \triangleleft q$, $P \triangleright i$,
$U \triangle V$, and $M \diamond N$ by:

• $P \triangleleft q = (p_1, \ldots, p_t, q)$

• $P \triangleright i = (p_1, \ldots, p_{i-1}, p_{i+1}, \ldots, p_t)$

• $U \triangle V = (u_1, u_2, \ldots, u_k, v_1, v_2, \ldots, v_b)$

• $M \diamond N = (m_1 + n_1, m_2 + 1, \ldots, m_r + 1, n_2 + 1, \ldots, n_s + 1, U \triangle V)$

where $m_i$, $n_j$ are integers, $1 \leq i \leq r$ and $1 \leq j \leq s$. Let us
denote by Huffman the Huffman algorithm, which gets as input a tuple
$(f_1, \ldots, f_k)$ of integers, and returns a tuple
$((c_1, l_1), \ldots, (c_k, l_k))$, where $c_i$ is the codeword associated to the
symbol with the frequency $f_i$, and $l_i$ is the length of $c_i$ for all $i$,
$1 \leq i \leq k$.

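Huffman(tuple $F = (f_1, \ldots, f_k)$, where $k \geq 1$)

1. $\mathcal{L} := ((f_1, 0, (1)), \ldots, (f_k, 0, (k)))$; Let $V = (\lambda, \lambda, \ldots, \lambda)$ be a $k$-tuple;

2. if $k = 1$ then $V.1 := 0$;

3. while $Len(\mathcal{L}) > 1$ do
   begin

   3.1. Let $i < j$ be such that $1 \leq i, j \leq Len(\mathcal{L})$ and $\mathcal{L}.i.1$, $\mathcal{L}.j.1$ are
        the smallest elements of the set $\{\mathcal{L}.q.1 \mid 1 \leq q \leq Len(\mathcal{L})\}$;

   3.2. $First := \{\mathcal{L}.i.Len(\mathcal{L}.i).r \mid 1 \leq r \leq Len(\mathcal{L}.i.Len(\mathcal{L}.i))\}$;

   3.3. $Second := \{\mathcal{L}.j.Len(\mathcal{L}.j).r \mid 1 \leq r \leq Len(\mathcal{L}.j.Len(\mathcal{L}.j))\}$;

   3.4. for each $x \in First$ do $V.x := 0 \cdot V.x$;

   3.5. for each $x \in Second$ do $V.x := 1 \cdot V.x$;

   3.6. $U := \mathcal{L}.i \diamond \mathcal{L}.j$; $\mathcal{L} := \mathcal{L} \triangleright j$; $\mathcal{L} := \mathcal{L} \triangleright i$; $\mathcal{L} := \mathcal{L} \triangleleft U$;
   end

4. return $((V.1, |V.1|), \ldots, (V.k, |V.k|))$;

Figure 2. The Huffman algorithm.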
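The procedure of Figure 2 can be followed step by step in Python. The sketch below is one possible reading, not a reference implementation: the name `huffman` is hypothetical, and ties between equal frequencies are broken by position in the working list, which is an assumption since step 3.1 does not fix a tie-breaking rule.

```python
def huffman(freqs):
    """Return ((codeword, length), ...) for a non-empty tuple of frequencies."""
    k = len(freqs)
    # Each entry: (weight, depth counters..., tuple of 1-based symbol indices).
    entries = [(f, 0, (i,)) for i, f in enumerate(freqs, start=1)]
    code = [""] * (k + 1)            # code[0] is unused; indices are 1-based
    if k == 1:
        code[1] = "0"
    while len(entries) > 1:
        # Step 3.1: two smallest weights; ties broken by position (assumption).
        order = sorted(range(len(entries)), key=lambda q: (entries[q][0], q))
        i, j = sorted(order[:2])
        first, second = entries[i], entries[j]
        for x in first[-1]:          # step 3.4
            code[x] = "0" + code[x]
        for x in second[-1]:         # step 3.5
            code[x] = "1" + code[x]
        # Step 3.6: the diamond operation on the two selected entries.
        merged = (first[0] + second[0],
                  *(d + 1 for d in first[1:-1]),
                  *(d + 1 for d in second[1:-1]),
                  first[-1] + second[-1])
        del entries[j]               # remove j first so that index i stays valid
        del entries[i]
        entries.append(merged)
    return tuple((code[x], len(code[x])) for x in range(1, k + 1))


print(huffman((1, 1, 2)))
```

Prepending a bit to every symbol of a merged group is what makes the final codewords read from the root of the implicit tree down to the leaf.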

Let $\Sigma = \{\sigma_0, \sigma_1, \ldots, \sigma_{m-1}\}$ be an alphabet, and
$m \in \{1, 2, \ldots, 256\}$. Let us explain the idea of our algorithm, denoted
by EAHn. Let $w = w_1 w_2 \ldots w_h$ be a string over $\Sigma$. We encode $w$ by
a 5-tuple $U = (A, B, C, D, E)$, where $A, B, C, D, E \in \{0, 1\}^+$ are
constructed as follows.

1. EAHn(u;).l = A. Let Idx : E — > {0, 1, . . . , m — 1} be a function 
which gets as input a symbol cr G E and outputs an index % such 
that a = <Tj. If /i > n, then ^ = 61062 (Idx (toi)) . . . 61062(Mc(u;„)), 
that is, A is the conversion of the sequence Idx (w± ),..., Idx{w n ) 
from base 10 to base 2, and |A| = n * [log 2 m]. Otherwise, if h < n, 



EAH-gt.tex; 1/02/2008; 20:29; p. 5 



6 Drago§ N. Trinca 

then we consider A = bl0b2(Idx(wi)) . . . bl0b2(Idx(wh)), that is, A 
is the conversion of Idx(w\), . . . ,Idx(wh) from base 10 to base 2, 
and \A\ = h* [~log 2 m~\ . 

2. $EAHn(w).2 = B = B_0 B_1 \ldots B_{m^n - 1}$, where $B_j \in \{0, 1\}$ is defined by:

   $B_j = \begin{cases}
   1 & \text{if } \exists\, i \in \{0, \ldots, m - 1\} \text{ and } \exists\, k \text{ such that } w_k \ldots w_{k+n} \\
     & = Idx^{-1}(b10(j, m).1) \ldots Idx^{-1}(b10(j, m).n) \cdot \sigma_i. \\
   0 & \text{otherwise.}
   \end{cases}$

   for all $j$, $0 \leq j \leq m^n - 1$, and $b10(j, m)$ is a tuple of length $n$
   denoting the conversion of $j$ from base 10 to base $m$, such that
   $Len(b10(j, m)) = n$ and $b10(j, m).i$ is the $i$-th digit (from left to
   right) of this conversion.

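3. $EAHn(w).3 = C = C_0 C_1 \ldots C_{m-1}$, where
   $C_i = C_i^0 C_i^1 \ldots C_i^{m^n - 1}$, for all $i$, $0 \leq i \leq m - 1$, and
   $C_i^j \in \{0, 1, \lambda\}$ is defined by:

   $C_i^j = \begin{cases}
   1 & \text{if } B_j = 1 \text{ and } \exists\, k \in \{1, \ldots, h - n\} \text{ such that } w_k \ldots w_{k+n} \\
     & = Idx^{-1}(b10(j, m).1) \ldots Idx^{-1}(b10(j, m).n) \cdot \sigma_i. \\
   0 & \text{if } B_j = 1 \text{ and } \nexists\, k \in \{1, \ldots, h - n\} \text{ such that } w_k \ldots w_{k+n} \\
     & = Idx^{-1}(b10(j, m).1) \ldots Idx^{-1}(b10(j, m).n) \cdot \sigma_i. \\
   \lambda & \text{if } B_j = 0.
   \end{cases}$

   for all $j$, $0 \leq j \leq m^n - 1$. It is clear that
   $|C| = m * |\{k \mid B_k = 1\}|$.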

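4. $EAHn(w).4 = D = D_0 D_1 \ldots D_{m-1}$, where
   $D_i = D_i^0 D_i^1 \ldots D_i^{m^n - 1}$ for all $i$, $0 \leq i \leq m - 1$, and
   $D_i^j \in \{0, 1\}^*$ is defined by:

   $D_i^j = \begin{cases}
   Mb10b2(Freq(\sigma_i, j)) & \text{if } C_i^j = 1. \\
   \lambda & \text{if } C_i^j \neq 1.
   \end{cases}$

   where $Freq(\sigma_i, j) = |\{k \mid w_k \ldots w_{k+n} = \sigma_{b10(j, m).1} \ldots \sigma_{b10(j, m).n} \sigma_i\}|$.

   Let us denote by $Marked$ the set $\{(i, j) \mid C_i^j = 1\}$. The greatest
   element of the set $\{|b10b2(Freq(\sigma_i, j))| \mid (i, j) \in Marked\}$ is
   denoted by $Max$. Then, $Mb10b2(Freq(\sigma_i, j))$ is defined by:

   $Mb10b2(Freq(\sigma_i, j)) = \underbrace{00 \ldots 0}_{t(i, j)}\; b10b2(Freq(\sigma_i, j))$

   where $t(i, j) = Max - |b10b2(Freq(\sigma_i, j))|$.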

5. $EAHn(w).5 = E$ denotes the compression of $w$ using $A$, $B$, $C$, $D$,
   adaptive codes of order $n$, and the Huffman algorithm.
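The first two components of the tuple can be illustrated with a short Python sketch. It assumes that the alphabet is given as a string listing $\sigma_0, \ldots, \sigma_{m-1}$ in order and that $b10b2$ pads each index to $\lceil \log_2 m \rceil$ bits, which is what the stated length of $A$ suggests; the names `b10b2`, `component_a` and `component_b` are hypothetical helpers.

```python
from itertools import product


def b10b2(value, width):
    """Binary representation of value, left-padded with zeros to width bits."""
    return format(value, "b").zfill(width) if width > 0 else ""


def component_a(w, alphabet, n):
    """Component A: indices of the first min(h, n) symbols of w, in binary."""
    width = (len(alphabet) - 1).bit_length()   # equals ceil(log2(m)) for m >= 1
    return "".join(b10b2(alphabet.index(s), width) for s in w[:n])


def component_b(w, alphabet, n):
    """Component B: one bit per length-n context j, for j = 0, ..., m**n - 1."""
    # Contexts that occur in w and are followed by at least one more symbol.
    followed = {w[k:k + n] for k in range(len(w) - n)}
    return "".join("1" if "".join(ctx) in followed else "0"
                   for ctx in product(alphabet, repeat=n))


print(component_a("abab", "abcde", 1), component_b("abab", "abcde", 1))
```

Enumerating contexts with `product(alphabet, repeat=n)` visits them in the same order as the base-$m$ digits of $j$ read from left to right, so the bit at position $j$ of the result corresponds to $B_j$.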
EAHn(string $w = w_1 w_2 \ldots w_h \in \Sigma^+$, where $w_i \in \Sigma$ for all $i$, $1 \leq i \leq h$)

1. $G_n(w) := (\emptyset, \emptyset, f)$; $A := \lambda$; $B := \lambda$; $C := \lambda$; $D := \lambda$; $E := \lambda$;

2. if $h > n$ then
   begin
   2.1. for $j = 1$ to $h - n$ do $G_n(w).1 := G_n(w).1 \cup \{w_j w_{j+1} \ldots w_{j+n-1}\}$;
   2.2. for $j = n + 1$ to $h$ do $G_n(w).1 := G_n(w).1 \cup \{w_j\}$;
   2.3. for $j = 1$ to $h - n$ do $G_n(w).2 := G_n(w).2 \cup \{(w_j \ldots w_{j+n-1}, w_{j+n})\}$;
   2.4. if $n = 1$ then
        begin
        2.4.1. for $j = 1$ to $h - 1$ do
        2.4.2. if $w_j = w_{j+1}$ then $G_n(w).2 := G_n(w).2 \cup \{(w_j, w_j\text{\_aux}), (w_j\text{\_aux}, w_j)\}$;
        end
   2.5. else for $j = n + 1$ to $h - 1$ do $G_n(w).2 := G_n(w).2 \cup \{(w_j, w_{j-n+1} \ldots w_j)\}$;
   end

3. if $n = 1$ then

4. for $j = 1$ to $h - 1$ do

5. if $w_j = w_{j+1}$ then $G_n(w).1 := G_n(w).1 \cup \{w_j\text{\_aux}\}$;

6. for each $e \in G_n(w).2$ do $f(e) := (0, \lambda)$;

7. for $i = n + 1$ to $h$ do
   begin
   7.1. if $n = 1$ and $w_{i-1} = w_i$ then let $e \in G_n(w).2$ be $(w_i, w_i\text{\_aux})$;
   7.2. else let $e \in G_n(w).2$ be $(w_{i-n} \ldots w_{i-1}, w_i)$;
   7.3. $f(e).1 := f(e).1 + 1$;
   end

8. for each $w_j w_{j+1} \ldots w_{j+n-1} \in Vbase$ do
   begin
   8.1. $fu := ()$; $count := 0$;
   8.2. $tuple := Sorted(outputb^{G_n(w)}(w_j \ldots w_{j+n-1}) \cup outputa^{G_n(w)}(w_j \ldots w_{j+n-1}))$;
   8.3. for $i = 1$ to $Len(tuple)$ do $fu := fu \triangleleft f(tuple.i).1$;
   8.4. for $i = 1$ to $Len(tuple)$ do
        begin
        8.4.1. $count := count + 1$;
        8.4.2. $f(tuple.i).2 := Huffman(fu).count.1$;
        end
   end

9. $A := b10b2(Idx(w_1)) \ldots b10b2(Idx(w_{\min(h, n)}))$;

10. for $j = 0$ to $m^n - 1$ do

11. if $(\sigma_{b10(j, m).1} \ldots \sigma_{b10(j, m).n}, \sigma_i) \in G_n(w).2$ for some $i$ then $B := B \cdot 1$; else $B := B \cdot 0$;

12. for $i = 0$ to $m - 1$ do

13. for $j = 0$ to $m^n - 1$ do

14. if $B_j = 1$ then
    begin
    14.1. if $n = 1$ and $i = j$ then $e := (\sigma_i, \sigma_i\text{\_aux})$;
    14.2. else $e := (\sigma_{b10(j, m).1} \ldots \sigma_{b10(j, m).n}, \sigma_i)$;
    14.3. if $e \in G_n(w).2$ then
    14.4. begin $C := C \cdot 1$; $D := D \cdot Mb10b2(f(e).1)$; end else $C := C \cdot 0$;
    end

15. for $i = n + 1$ to $h$ do
    begin
    15.1. if $n = 1$ and $w_{i-1} = w_i$ then $e := (w_i, w_i\text{\_aux})$; else $e := (w_{i-n} \ldots w_{i-1}, w_i)$;
    15.2. $E := E \cdot f(e).2$;
    end

16. return $(A, B, C, D, E)$;

Figure 3. The EAH data compression algorithm.
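To make the role of the fifth component more concrete, the sketch below computes $E$ for order one only. It is an illustration under several assumptions rather than the algorithm of Figure 3 itself: the names `huffman_codes` and `eah1_payload` are hypothetical, $Sorted$ is taken to be lexicographic ordering of edges, Huffman ties are broken by position, and the header components $A$, $B$, $C$, $D$ are not produced.

```python
def huffman_codes(freqs):
    """Codewords for freqs, merging the two smallest weights (ties by position)."""
    groups = [(f, (i,)) for i, f in enumerate(freqs)]
    codes = [""] * len(freqs)
    if len(freqs) == 1:
        codes[0] = "0"
    while len(groups) > 1:
        order = sorted(range(len(groups)), key=lambda q: (groups[q][0], q))
        i, j = sorted(order[:2])
        (fi, si), (fj, sj) = groups[i], groups[j]
        for x in si:
            codes[x] = "0" + codes[x]
        for x in sj:
            codes[x] = "1" + codes[x]
        del groups[j], groups[i]         # j > i, so deleting j first is safe
        groups.append((fi + fj, si + sj))
    return codes


def eah1_payload(w):
    """Component E for order one: concatenated edge codewords for w_2 ... w_h."""
    def edge(prev, cur):
        # A repeated symbol is routed through the auxiliary vertex.
        return (prev, prev + "_aux") if prev == cur else (prev, cur)

    counts = {}
    for prev, cur in zip(w, w[1:]):              # edge frequencies f(e).1
        e = edge(prev, cur)
        counts[e] = counts.get(e, 0) + 1
    codeword = {}
    for node in set(w[:-1]):                     # base vertices with outputs
        out_edges = sorted(e for e in counts if e[0] == node)  # assumed Sorted
        freqs = [counts[e] for e in out_edges]
        codeword.update(zip(out_edges, huffman_codes(freqs)))  # f(e).2
    return "".join(codeword[edge(p, c)] for p, c in zip(w, w[1:]))


print(eah1_payload("abccdbbab"))
```

Each symbol after the first is encoded with a Huffman code built only from the edges leaving the previous symbol's vertex, which is where the adaptive code of order one enters the construction.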
If $S$ is a set of edges, then $Sorted(S)$ denotes a tuple $U$ which
contains the elements of $S$, sorted with a certain algorithm (the
algorithm is not important here, but must be fixed).

Let $w$ be an input data string. We denote by $H(w)$ the encoding of
$w$ using the Huffman algorithm, and by $LZ(w)$ the encoding of $w$ using
the Lempel-Ziv data compression algorithm [18]. Let us consider the
following notations.

• $LH(w) = |H(w)|$.

• $LLZ(w) = |LZ(w)|$.

• $LEAHn(w) = \sum_{i=1}^{5} |EAHn(w).i|$.



EXAMPLE 3. Let $\Sigma = \{a, b, c, d, e\}$ be an alphabet, and $w$ a string of
length 200 over $\Sigma$, given as below (between brackets).

w= [abedcababedccabedcedcababedcedcccabedcabedcedccababedc 

abedccccedccedccedcababedcabedcedccedcababedcabedccabedcab 

abedcedcccccedcabedcabedccccedcccabedcccedccabedccccabedcc 

ababedcabedcedccabedcababedced]. 

Applying EAH1 to the string w, we get the following adaptive graph 
of order one. 



Figure 4. $G_1(w)$.

It is easy to verify that we obtain:

$LH(w) = 462$, $LEAH1(w) = 310$, $LLZ(w) = 388$,

which shows that in this case EAH1 substantially outperforms the
well-known Lempel-Ziv universal data compression algorithm [18].


4. Conclusions and Further Work 

Adaptive codes associate variable-length codewords to symbols being 
encoded depending on the previous symbols in the input data string. 
This class of codes has been introduced as a new class of non-standard
variable-length codes. New algorithms for data compression,
based on adaptive codes of order one, have been presented in [10],
where we have shown empirically that for a large class of input data
strings, these algorithms substantially outperform the Lempel-Ziv
universal data compression algorithm [18].

The EAH encoder has been proposed in [11] as an improved generalization
of these algorithms. The work carried out so far [10, 11] has
shown empirically that EAH is a very promising data compression
algorithm. In this paper, we presented a translation of the EAH algorithm
into graph theory. Further work in this area will focus on finding
new improvements for the EAH algorithm, as well as new algorithms
for data compression based on adaptive codes.